Total Reward Stochastic Games and Sensitive Average Reward Strategies

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Average Reward Timed Games

We consider real-time games where the goal consists, for each player, in maximizing the average reward he or she receives per time unit. We consider zero-sum rewards, so that a reward of +r to one player corresponds to a reward of −r to the other player. The games are played on discrete-time game structures which can be specified using a two-player version of timed automata whose locations are ...

متن کامل

Learning in Average Reward Stochastic Games A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games

A large class of sequential decision making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. It can be considered that the stochastic games are multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain average reward equilibrium for irreducible stochastic games. In our ...

متن کامل

Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning

Research in reinforcement learning (RL) has thus far concentrated on two optimality criteria: the discounted framework, which has been very well-studied, and the average-reward framework, in which interest is rapidly increasing. In this paper, we present a framework called sensitive discount optimality which ooers an elegant way of linking these two paradigms. Although sensitive discount optima...

متن کامل

Reinforcement Learning for Average Reward Zero-Sum Games

We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for stochastic shortest path games. Convergence is proved using the ODE (Ordinary Differential Equation) method. We further discuss the case where not all the actions are played by the opponent with comparab...

متن کامل

Reward Ideas and Strategies

Reward system can be considered the center of our body-mind equilibrium: we have medial reward, leading life, reproduction, needs, and willings; if this is not adequatly satisfied, dopamine is converted by dopa-beta-hydroxilase into catecolamines, and habenula regulates the balance so that lateral reward risk to become prevalent, with activation of HPA axis (hypothalamus-pituitary gland-adrenal...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Optimization Theory and Applications

سال: 1998

ISSN: 0022-3239,1573-2878

DOI: 10.1023/a:1022697100194